Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 918 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 317.2 KiB |
| Average record size in memory | 353.8 B |
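The overview figures above can be approximated directly with pandas. The following is a minimal sketch, assuming a DataFrame `df` that holds the dataset; here a toy frame built from the first three rows of the sample table stands in for the full 918-row data, and the `overview` dictionary is a name introduced only for illustration.

```python
import pandas as pd

# Toy frame built from the first three rows shown under "First rows";
# the real dataset has 918 observations and 12 variables.
df = pd.DataFrame(
    {
        "Age": [40, 49, 37],
        "Sex": ["M", "F", "M"],
        "Cholesterol": [289, 180, 283],
        "HeartDisease": [0, 1, 0],
    }
)

n_obs, n_vars = df.shape
missing_cells = int(df.isna().sum().sum())          # NaN/None cells over the whole frame
duplicate_rows = int(df.duplicated().sum())         # rows identical to an earlier row
memory_bytes = int(df.memory_usage(deep=True).sum())  # includes index and string payloads

overview = {
    "Number of variables": n_vars,
    "Number of observations": n_obs,
    "Missing cells": missing_cells,
    "Missing cells (%)": missing_cells / (n_obs * n_vars),
    "Duplicate rows": duplicate_rows,
    "Duplicate rows (%)": duplicate_rows / n_obs,
    "Total size in memory (bytes)": memory_bytes,
    "Average record size in memory (bytes)": memory_bytes / n_obs,
}

for name, value in overview.items():
    print(f"{name}: {value}")
```

The average record size is simply the total memory divided by the number of observations, which matches the report: 317.2 KiB over 918 rows is roughly 353.8 B per row.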
Variable types
| Numeric | 5 |
|---|---|
| Categorical | 6 |
| Boolean | 1 |
Age is highly correlated with FastingBS | High correlation |
RestingBP is highly correlated with FastingBS | High correlation |
Cholesterol is highly correlated with FastingBS and 1 other fields | High correlation |
FastingBS is highly correlated with Age and 2 other fields | High correlation |
MaxHR is highly correlated with HeartDisease | High correlation |
HeartDisease is highly correlated with Cholesterol and 1 other fields | High correlation |
HeartDisease is highly correlated with ST_Slope and 1 other fields | High correlation |
ST_Slope is highly correlated with HeartDisease | High correlation |
ChestPainType is highly correlated with HeartDisease | High correlation |
ChestPainType is highly correlated with ExerciseAngina and 1 other fields | High correlation |
MaxHR is highly correlated with ExerciseAngina and 1 other fields | High correlation |
ExerciseAngina is highly correlated with ChestPainType and 3 other fields | High correlation |
Oldpeak is highly correlated with ExerciseAngina and 2 other fields | High correlation |
ST_Slope is highly correlated with Oldpeak | High correlation |
HeartDisease is highly correlated with ChestPainType and 3 other fields | High correlation |
Cholesterol has 172 (18.7%) zeros | Zeros |
Oldpeak has 368 (40.1%) zeros | Zeros |
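The zero counts flagged above are easy to reproduce, and it may be worth checking whether a zero is a real reading or a stand-in for a missing one. The sketch below uses illustrative values only (not the real data), a hypothetical alert threshold `ZERO_ALERT_THRESHOLD`, and the assumption that a Cholesterol value of 0 marks an unrecorded measurement, whereas 0 is a legitimate Oldpeak reading.

```python
import pandas as pd

# Illustrative values only; they are not taken from the dataset.
df = pd.DataFrame(
    {
        "Cholesterol": [289, 0, 283, 214, 0],
        "Oldpeak": [0.0, 1.0, 0.0, 1.5, 0.0],
    }
)

# Count and share of exact zeros per column.
zero_counts = (df == 0).sum()
zero_share = zero_counts / len(df)

# Hypothetical cutoff for raising a "Zeros" alert.
ZERO_ALERT_THRESHOLD = 0.10
flagged = zero_share[zero_share > ZERO_ALERT_THRESHOLD].index.tolist()

# Assumption: zero cholesterol means "not recorded", so mask it as NaN
# before computing statistics such as the mean.
cholesterol_clean = df["Cholesterol"].mask(df["Cholesterol"] == 0)

print(zero_counts.to_dict(), flagged, cholesterol_clean.mean())
```

Whether masking is appropriate depends on how the data were collected; the report itself only counts the zeros.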
Reproduction
| Analysis started | 2021-11-14 14:09:40.650439 |
|---|---|
| Analysis finished | 2021-11-14 14:10:08.385259 |
| Duration | 27.73 seconds |
| Software version | pandas-profiling v3.1.0 |
Age
Numeric
| Distinct | 50 |
|---|---|
| Distinct (%) | 5.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 53.51089325 |
| Minimum | 28 |
| Maximum | 77 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.3 KiB |
Quantile statistics
| Minimum | 28 |
|---|---|
| 5-th percentile | 37 |
| Q1 | 47 |
| median | 54 |
| Q3 | 60 |
| 95-th percentile | 68 |
| Maximum | 77 |
| Range | 49 |
| Interquartile range (IQR) | 13 |
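The quantile table can be recomputed with `Series.quantile`. This is a minimal sketch using the ten Age values from the "First rows" sample rather than the full column, so the numbers differ from the table above; the `stats` dictionary is an illustrative name, and pandas' default linear interpolation is assumed.

```python
import pandas as pd

# Age values of the ten rows listed under "First rows" (not the full column).
age = pd.Series([40, 49, 37, 48, 54, 39, 45, 54, 37, 48], name="Age")

# Default interpolation is linear between neighbouring order statistics.
q = age.quantile([0.05, 0.25, 0.50, 0.75, 0.95])

stats = {
    "Minimum": age.min(),
    "5-th percentile": q.loc[0.05],
    "Q1": q.loc[0.25],
    "median": q.loc[0.50],
    "Q3": q.loc[0.75],
    "95-th percentile": q.loc[0.95],
    "Maximum": age.max(),
    "Range": age.max() - age.min(),
    "Interquartile range (IQR)": q.loc[0.75] - q.loc[0.25],
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

The same relations hold in the report: the range is 77 - 28 = 49 and the IQR is 60 - 47 = 13.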
Descriptive statistics
| Standard deviation | 9.432616507 |
|---|---|
| Coefficient of variation (CV) | 0.1762746973 |
| Kurtosis | -0.3861396124 |
| Mean | 53.51089325 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.1959330287 |
| Sum | 49123 |
| Variance | 88.97425416 |
| Monotonicity | Not monotonic |
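The descriptive statistics are related to one another in simple ways: the variance is the squared standard deviation, and the coefficient of variation is the standard deviation divided by the mean (9.4326 / 53.5109 ≈ 0.1763 above). The sketch below recomputes them on the ten Age values from the "First rows" sample. It assumes the sample (ddof=1) standard deviation and pandas' own `skew()` and `kurt()` estimators, which may not be exactly what the report used; `desc` is an illustrative name.

```python
import pandas as pd

# Age values of the ten rows listed under "First rows" (not the full column).
age = pd.Series([40, 49, 37, 48, 54, 39, 45, 54, 37, 48], name="Age")

mean = age.mean()
std = age.std()  # sample standard deviation (ddof=1), an assumption here

desc = {
    "Standard deviation": std,
    "Coefficient of variation (CV)": std / mean,
    "Kurtosis": age.kurt(),   # pandas returns excess kurtosis (normal = 0)
    "Mean": mean,
    # Median of absolute deviations from the median.
    "Median Absolute Deviation (MAD)": (age - age.median()).abs().median(),
    "Skewness": age.skew(),
    "Sum": age.sum(),
    "Variance": age.var(),
    "Monotonicity": (
        "Monotonic"
        if age.is_monotonic_increasing or age.is_monotonic_decreasing
        else "Not monotonic"
    ),
}

for name, value in desc.items():
    print(f"{name}: {value}")
```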
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 54 | 51 | 5.6% |
| 58 | 42 | 4.6% |
| 55 | 41 | 4.5% |
| 56 | 38 | 4.1% |
| 57 | 38 | 4.1% |
| 52 | 36 | 3.9% |
| 51 | 35 | 3.8% |
| 59 | 35 | 3.8% |
| 62 | 35 | 3.8% |
| 53 | 33 | 3.6% |
| Other values (40) | 534 |
| Value | Count | Frequency (%) |
| 28 | 1 | 0.1% |
| 29 | 3 | 0.3% |
| 30 | 1 | 0.1% |
| 31 | 2 | 0.2% |
| 32 | 5 | |
| 33 | 2 | 0.2% |
| 34 | 7 | |
| 35 | 11 | |
| 36 | 6 | |
| 37 | 11 |
| Value | Count | Frequency (%) |
| 77 | 2 | 0.2% |
| 76 | 2 | 0.2% |
| 75 | 3 | 0.3% |
| 74 | 7 | |
| 73 | 1 | 0.1% |
| 72 | 4 | 0.4% |
| 71 | 5 | 0.5% |
| 70 | 7 | |
| 69 | 13 | |
| 68 | 10 |
Sex
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.1 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | M |
|---|---|
| 2nd row | F |
| 3rd row | M |
| 4th row | F |
| 5th row | M |
Common Values
| Value | Count | Frequency (%) |
| M | 725 | |
| F | 193 | 21.0% |
ChestPainType
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 53.9 KiB |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 2.949891068 |
| Min length | 2 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | ATA |
|---|---|
| 2nd row | NAP |
| 3rd row | ATA |
| 4th row | ASY |
| 5th row | NAP |
Common Values
| Value | Count | Frequency (%) |
| ASY | 496 | |
| NAP | 203 | |
| ATA | 173 | 18.8% |
| TA | 46 | 5.0% |
RestingBP
Numeric
| Distinct | 67 |
|---|---|
| Distinct (%) | 7.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 132.3965142 |
| Minimum | 0 |
| Maximum | 200 |
| Zeros | 1 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 106 |
| Q1 | 120 |
| median | 130 |
| Q3 | 140 |
| 95-th percentile | 160 |
| Maximum | 200 |
| Range | 200 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 18.51415412 |
|---|---|
| Coefficient of variation (CV) | 0.1398386826 |
| Kurtosis | 3.271250917 |
| Mean | 132.3965142 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.1798393101 |
| Sum | 121540 |
| Variance | 342.7739028 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 120 | 132 | |
| 130 | 118 | 12.9% |
| 140 | 107 | 11.7% |
| 110 | 58 | 6.3% |
| 150 | 55 | 6.0% |
| 160 | 50 | 5.4% |
| 125 | 29 | 3.2% |
| 135 | 20 | 2.2% |
| 115 | 19 | 2.1% |
| 128 | 18 | 2.0% |
| Other values (57) | 312 |
| Value | Count | Frequency (%) |
| 0 | 1 | 0.1% |
| 80 | 1 | 0.1% |
| 92 | 1 | 0.1% |
| 94 | 2 | 0.2% |
| 95 | 6 | 0.7% |
| 96 | 1 | 0.1% |
| 98 | 1 | 0.1% |
| 100 | 15 | |
| 101 | 1 | 0.1% |
| 102 | 3 | 0.3% |
| Value | Count | Frequency (%) |
| 200 | 4 | 0.4% |
| 192 | 1 | 0.1% |
| 190 | 2 | 0.2% |
| 185 | 1 | 0.1% |
| 180 | 12 | |
| 178 | 3 | 0.3% |
| 174 | 1 | 0.1% |
| 172 | 2 | 0.2% |
| 170 | 14 | |
| 165 | 2 | 0.2% |
Cholesterol
Numeric
| Distinct | 222 |
|---|---|
| Distinct (%) | 24.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 198.7995643 |
| Minimum | 0 |
| Maximum | 603 |
| Zeros | 172 |
| Zeros (%) | 18.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 173.25 |
| median | 223 |
| Q3 | 267 |
| 95-th percentile | 331.3 |
| Maximum | 603 |
| Range | 603 |
| Interquartile range (IQR) | 93.75 |
Descriptive statistics
| Standard deviation | 109.3841446 |
|---|---|
| Coefficient of variation (CV) | 0.5502232611 |
| Kurtosis | 0.1182084685 |
| Mean | 198.7995643 |
| Median Absolute Deviation (MAD) | 46 |
| Skewness | -0.6100864307 |
| Sum | 182498 |
| Variance | 11964.89108 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 172 | 18.7% |
| 254 | 11 | 1.2% |
| 223 | 10 | 1.1% |
| 220 | 10 | 1.1% |
| 230 | 9 | 1.0% |
| 211 | 9 | 1.0% |
| 216 | 9 | 1.0% |
| 204 | 9 | 1.0% |
| 219 | 8 | 0.9% |
| 246 | 8 | 0.9% |
| Other values (212) | 663 |
| Value | Count | Frequency (%) |
| 0 | 172 | |
| 85 | 1 | 0.1% |
| 100 | 2 | 0.2% |
| 110 | 1 | 0.1% |
| 113 | 1 | 0.1% |
| 117 | 1 | 0.1% |
| 123 | 1 | 0.1% |
| 126 | 2 | 0.2% |
| 129 | 1 | 0.1% |
| 131 | 1 | 0.1% |
| Value | Count | Frequency (%) |
| 603 | 1 | |
| 564 | 1 | |
| 529 | 1 | |
| 518 | 1 | |
| 491 | 1 | |
| 468 | 1 | |
| 466 | 1 | |
| 458 | 1 | |
| 417 | 1 | |
| 412 | 1 |
FastingBS
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.1 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 704 | |
| 1 | 214 | 23.3% |
RestingECG
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 55.4 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 4.610021786 |
| Min length | 2 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Normal |
|---|---|
| 2nd row | Normal |
| 3rd row | ST |
| 4th row | Normal |
| 5th row | Normal |
Common Values
| Value | Count | Frequency (%) |
| Normal | 552 | |
| LVH | 188 | 20.5% |
| ST | 178 | 19.4% |
MaxHR
Numeric
| Distinct | 119 |
|---|---|
| Distinct (%) | 13.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 136.8093682 |
| Minimum | 60 |
| Maximum | 202 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.3 KiB |
Quantile statistics
| Minimum | 60 |
|---|---|
| 5-th percentile | 96 |
| Q1 | 120 |
| median | 138 |
| Q3 | 156 |
| 95-th percentile | 178 |
| Maximum | 202 |
| Range | 142 |
| Interquartile range (IQR) | 36 |
Descriptive statistics
| Standard deviation | 25.46033414 |
|---|---|
| Coefficient of variation (CV) | 0.1861008093 |
| Kurtosis | -0.44824782 |
| Mean | 136.8093682 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | -0.1443594185 |
| Sum | 125591 |
| Variance | 648.2286144 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 150 | 43 | 4.7% |
| 140 | 41 | 4.5% |
| 120 | 36 | 3.9% |
| 130 | 33 | 3.6% |
| 160 | 25 | 2.7% |
| 110 | 23 | 2.5% |
| 125 | 21 | 2.3% |
| 122 | 20 | 2.2% |
| 170 | 20 | 2.2% |
| 115 | 16 | 1.7% |
| Other values (109) | 640 |
| Value | Count | Frequency (%) |
| 60 | 1 | |
| 63 | 1 | |
| 67 | 1 | |
| 69 | 1 | |
| 70 | 1 | |
| 71 | 1 | |
| 72 | 2 | |
| 73 | 1 | |
| 77 | 1 | |
| 78 | 1 |
| Value | Count | Frequency (%) |
| 202 | 1 | 0.1% |
| 195 | 1 | 0.1% |
| 194 | 1 | 0.1% |
| 192 | 1 | 0.1% |
| 190 | 2 | |
| 188 | 2 | |
| 187 | 1 | 0.1% |
| 186 | 2 | |
| 185 | 4 | |
| 184 | 4 |
ExerciseAngina
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.0 KiB |
| Value | Count | Frequency (%) |
| False | 547 | |
| True | 371 |
Oldpeak
Numeric
| Distinct | 53 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.8873638344 |
| Minimum | -2.6 |
| Maximum | 6.2 |
| Zeros | 368 |
| Zeros (%) | 40.1% |
| Negative | 13 |
| Negative (%) | 1.4% |
| Memory size | 7.3 KiB |
Quantile statistics
| Minimum | -2.6 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0.6 |
| Q3 | 1.5 |
| 95-th percentile | 3 |
| Maximum | 6.2 |
| Range | 8.8 |
| Interquartile range (IQR) | 1.5 |
Descriptive statistics
| Standard deviation | 1.066570151 |
|---|---|
| Coefficient of variation (CV) | 1.201953595 |
| Kurtosis | 1.203063684 |
| Mean | 0.8873638344 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | 1.022872022 |
| Sum | 814.6 |
| Variance | 1.137571887 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 368 | |
| 1 | 86 | 9.4% |
| 2 | 76 | 8.3% |
| 1.5 | 53 | 5.8% |
| 3 | 28 | 3.1% |
| 1.2 | 26 | 2.8% |
| 0.2 | 22 | 2.4% |
| 0.5 | 19 | 2.1% |
| 1.4 | 18 | 2.0% |
| 1.8 | 17 | 1.9% |
| Other values (43) | 205 |
| Value | Count | Frequency (%) |
| -2.6 | 1 | |
| -2 | 1 | |
| -1.5 | 1 | |
| -1.1 | 1 | |
| -1 | 2 | |
| -0.9 | 1 | |
| -0.8 | 1 | |
| -0.7 | 1 | |
| -0.5 | 2 | |
| -0.1 | 2 |
| Value | Count | Frequency (%) |
| 6.2 | 1 | 0.1% |
| 5.6 | 1 | 0.1% |
| 5 | 1 | 0.1% |
| 4.4 | 1 | 0.1% |
| 4.2 | 2 | 0.2% |
| 4 | 8 | |
| 3.8 | 1 | 0.1% |
| 3.7 | 1 | 0.1% |
| 3.6 | 4 | |
| 3.5 | 2 | 0.2% |
ST_Slope
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 54.0 KiB |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 3.139433551 |
| Min length | 2 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Up |
|---|---|
| 2nd row | Flat |
| 3rd row | Up |
| 4th row | Flat |
| 5th row | Up |
Common Values
| Value | Count | Frequency (%) |
| Flat | 460 | |
| Up | 395 | |
| Down | 63 | 6.9% |
HeartDisease
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 52.1 KiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 1 |
| 3rd row | 0 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 508 | |
| 0 | 410 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
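The rank-based definition can be sketched in a few lines. This is an illustrative implementation, not the one the report uses: `spearman_rho` is a hypothetical helper, ties receive average ranks (the pandas default), and the toy data are chosen so that the relationship is monotonic but not linear.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx = pd.Series(x).rank()  # average ranks for ties
    ry = pd.Series(y).rank()
    cov = ((rx - rx.mean()) * (ry - ry.mean())).mean()
    # Population (ddof=0) deviations to match the population covariance above.
    return cov / (rx.std(ddof=0) * ry.std(ddof=0))


# Toy data: y is a monotonic but nonlinear function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = np.corrcoef(x, y)[0, 1]  # Pearson's r on the raw values, for comparison

print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

Because the ranks of x and y coincide, ρ is 1 up to floating-point rounding, while Pearson's r on the raw values stays below 1.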
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
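The definition and the invariance property can both be checked numerically. The following is a minimal sketch with a hypothetical helper `pearson_r`, applied to the Age and MaxHR values of the first five sample rows; the rescaling constants are arbitrary and chosen only to illustrate a change of location and (positive) scale.

```python
import numpy as np


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())  # population (ddof=0) deviations, matching cov


# Age and MaxHR from the first five rows listed under "First rows".
age = [40, 49, 37, 48, 54]
max_hr = [172, 156, 98, 108, 122]

r = pearson_r(age, max_hr)

# Shift and rescale each variable separately (arbitrary positive scale factors).
r_rescaled = pearson_r([a * 12 + 3 for a in age], [h / 60 - 1 for h in max_hr])

print(f"r = {r:.6f}, after rescaling = {r_rescaled:.6f}")
```

With only five rows the value of r itself is not meaningful; the point is that both calls agree.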
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
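The pair-counting description corresponds to the simplest form of the coefficient, often called τ-a. The sketch below implements exactly that form with a hypothetical helper `kendall_tau_a`; tied pairs count as neither concordant nor discordant, and tie-adjusted variants such as τ-b, which libraries commonly use, are not covered.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, ignoring ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:      # both variables order the pair the same way
            concordant += 1
        elif sign < 0:    # the variables order the pair in opposite ways
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy data: of the 6 pairs, 5 are concordant and 1 is discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(f"tau = {tau:.4f}")
```

The loop is quadratic in the number of observations, which is fine for illustration but slower than the sorting-based algorithms used in practice.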
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
First rows
| | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40 | M | ATA | 140 | 289 | 0 | Normal | 172 | N | 0.0 | Up | 0 |
| 1 | 49 | F | NAP | 160 | 180 | 0 | Normal | 156 | N | 1.0 | Flat | 1 |
| 2 | 37 | M | ATA | 130 | 283 | 0 | ST | 98 | N | 0.0 | Up | 0 |
| 3 | 48 | F | ASY | 138 | 214 | 0 | Normal | 108 | Y | 1.5 | Flat | 1 |
| 4 | 54 | M | NAP | 150 | 195 | 0 | Normal | 122 | N | 0.0 | Up | 0 |
| 5 | 39 | M | NAP | 120 | 339 | 0 | Normal | 170 | N | 0.0 | Up | 0 |
| 6 | 45 | F | ATA | 130 | 237 | 0 | Normal | 170 | N | 0.0 | Up | 0 |
| 7 | 54 | M | ATA | 110 | 208 | 0 | Normal | 142 | N | 0.0 | Up | 0 |
| 8 | 37 | M | ASY | 140 | 207 | 0 | Normal | 130 | Y | 1.5 | Flat | 1 |
| 9 | 48 | F | ATA | 120 | 284 | 0 | Normal | 120 | N | 0.0 | Up | 0 |
Last rows
| | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 908 | 63 | M | ASY | 140 | 187 | 0 | LVH | 144 | Y | 4.0 | Up | 1 |
| 909 | 63 | F | ASY | 124 | 197 | 0 | Normal | 136 | Y | 0.0 | Flat | 1 |
| 910 | 41 | M | ATA | 120 | 157 | 0 | Normal | 182 | N | 0.0 | Up | 0 |
| 911 | 59 | M | ASY | 164 | 176 | 1 | LVH | 90 | N | 1.0 | Flat | 1 |
| 912 | 57 | F | ASY | 140 | 241 | 0 | Normal | 123 | Y | 0.2 | Flat | 1 |
| 913 | 45 | M | TA | 110 | 264 | 0 | Normal | 132 | N | 1.2 | Flat | 1 |
| 914 | 68 | M | ASY | 144 | 193 | 1 | Normal | 141 | N | 3.4 | Flat | 1 |
| 915 | 57 | M | ASY | 130 | 131 | 0 | Normal | 115 | Y | 1.2 | Flat | 1 |
| 916 | 57 | F | ATA | 130 | 236 | 0 | LVH | 174 | N | 0.0 | Flat | 1 |
| 917 | 38 | M | NAP | 138 | 175 | 0 | Normal | 173 | N | 0.0 | Up | 0 |